Convolutional Neural Networks for Authorship Attribution of Short Texts
نویسندگان
چکیده
We present a model to perform authorship attribution of tweets using Convolutional Neural Networks (CNNs) over character n-grams. We also present a strategy that improves model interpretability by estimating the importance of input text fragments in the predicted classification. The experimental evaluation shows that text CNNs perform competitively and are able to outperform previous methods.
منابع مشابه
Authorship Attribution with Convolutional Neural Networks and POS-Eliding
We use a convolutional neural network to perform authorship identification on a very homogeneous dataset of scientific publications. In order to investigate the effect of domain biases, we obscure words below a certain frequency threshold, retaining only their POS-tags. This procedure improves test performance due to better generalization on unseen data. Using our method, we are able to predict...
متن کاملCharacter-level and Multi-channel Convolutional Neural Networks for Large-scale Authorship Attribution
Convolutional neural networks (CNNs) have demonstrated superior capability for extracting information from raw signals in computer vision. Recently, characterlevel and multi-channel CNNs have exhibited excellent performance for sentence classification tasks. We apply CNNs to large-scale authorship attribution, which aims to determine an unknown text’s author among many candidate authors, motiva...
متن کاملSequential Short-Text Classification with Recurrent and Convolutional Neural Networks
Recent approaches based on artificial neural networks (ANNs) have shown promising results for short-text classification. However, many short texts occur in sequences (e.g., sentences in a document or utterances in a dialog), and most existing ANN-based systems do not leverage the preceding short texts when classifying a subsequent one. In this work, we present a model based on recurrent neural ...
متن کاملCombination of Convolutional and Recurrent Neural Network for Sentiment Analysis of Short Texts
Sentiment analysis of short texts is challenging because of the limited contextual information they usually contain. In recent years, deep learning models such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been applied to text sentiment analysis with comparatively remarkable results. In this paper, we describe a jointed CNN and RNN architecture, taking advant...
متن کاملPREPRINT VERSION An agent-driven semantical identifier using radial basis neural networks and reinforcement learning
Due to the huge availability of documents in digital form, and the deception possibility raise bound to the essence of digital documents and the way they are spread, the authorship attribution problem has constantly increased its relevance. Nowadays, authorship attribution, for both information retrieval and analysis, has gained great importance in the context of security, trust and copyright p...
متن کامل